Submitted by:
ID1: 320983216 Name1: Maxim Kolchinsky
ID2: 321340580 Name2: Anna Romanov
The data for this lesson comes from:
> Saigi et al. “MET-Oncogenic and JAK2-Inactivating Alterations Are Independent Factors That Affect Regulation of PD-L1 Expression in Lung Cancer” PLoS ONE. 2018 Jun 13;9(6):e99625. PMID: 24926665.
Purpose: The blockade of immune checkpoints such as PDL1 and PD-1 is being exploited therapeutically in several types of malignancies. Here, we aimed to understand the contribution of the genetics of lung cancer to the ability of tumor cells to escape immunosurveillance checkpoints. Experimental Design: More than 150 primary non-small cell lung cancers, including pulmonary sarcomatoid carcinomas, were tested for levels of the HLA-I complex, PD-L1, tumor-infiltrating CD8þ lymphocytes, and alterations in main lung cancer genes. Correlations were validated in cancer cell lines using appropriate treatments to activate or inhibit selected pathways. We also performed RNA sequencing to assess changes in gene expression after these treatments. Results: MET-oncogenic activation tended to associate with positive PD-L1 immunostaining, whereas STK11 mutations were correlated with negative immunostaining. In MET-altered cancer cells, MET triggered a transcriptional increase of PD-L1 that was independent of the IFNgmediated JAK/STAT pathway. The activation of MET also upregulated other immunosuppressive genes (PDCD1LG2 and SOCS1) and transcripts involved in angiogenesis (VEGFA and NRP1) and in cell proliferation. We also report recurrent inactivating mutations in JAK2 that co-occur with alterations in MET and STK11, which prevented the induction of immunoresponse-related genes following treatment with IFNg. Conclusions: We show that MET activation promotes the expression of several negative checkpoint regulators of the immunoresponse, including PD-L1. In addition, we report inactivation of JAK2 in lung cancer cells that prevented the response to IFNg. These alterations are likely to facilitate tumor growth by enabling immune tolerance and may affect the response to immune checkpoint inhibitors
This data was downloaded from GEO (GSE:GSE109720)
library(readr)
library(dplyr)
library(ggplot2)
rawcounts <- read_csv("data/lung_counts.csv")
metadata <- read_csv("data/lung_metadata.csv")
There are 4 cell types in the dataset
The distribution of samples accross cell types is:
table(metadata$celltype)
##
## EBC1 H1573 H1993 H596
## 9 6 6 12
The number of expressed genes per sample:
vals<- data.frame('expressed_genes' = colSums(rawcounts[,-1]>0))
vals
ggplot(vals, aes(x=expressed_genes)) + geom_histogram(bins=15)
vals<- data.frame('highly_expressed_genes' = colSums(rawcounts[,-1]>1000))
vals
ggplot(vals, aes(x=highly_expressed_genes)) + geom_histogram(bins=15)
treatment1<- 'No treatment'
treatment2<- 'Crizotinib'
celltype0<- 'EBC1'
dex <- metadata$dex
filtered_metadata<- metadata %>% filter(dex==treatment1 | dex==treatment2 ) %>% filter(celltype==celltype0)
filtered_metadata
filtered_samples<- filtered_metadata$id
filtered_counts<- rawcounts %>% select(filtered_samples)
filtered_counts<- bind_cols(rawcounts[,'ensgene'],filtered_counts)
filtered_counts
library(DESeq2)
first we fix the row names:
filtered_countsRN <- data.frame(filtered_counts, row.names=1)
filtered_countsRN
filtered_metadataRN <- data.frame(filtered_metadata, row.names=1)
filtered_metadataRN
now we can construct the DESeq2 dataset
dds <- DESeqDataSetFromMatrix(countData=filtered_countsRN, colData=filtered_metadataRN, design=~dex)
dds
## class: DESeqDataSet
## dim: 58347 6
## metadata(1): version
## assays(1): counts
## rownames(58347): ENSG00000223972.5 ENSG00000227232.5 ...
## ENSG00000278625.1 ENSG00000277374.1
## rowData names(0):
## colnames(6): AE1160 AE1163 ... AE1165 AE1168
## colData names(3): celltype dex geo_id
We run the DESe12 pipline:
dds <- DESeq(dds)
first look at the resuts
res <- results(dds, tidy=TRUE)
res <- tbl_df(res)
res
Using a %>%, arrange the results by the adjusted p-value.
res <- res %>%
arrange(padj)
res
res_1 <- res %>% filter(padj<0.05) %>% nrow()
There 5264 are genes with p-value < 0.05
res_2 <- res %>% filter(padj<0.01) %>% nrow()
There 4001 are genes with p-value < 0.01
res %>%
filter(padj<0.95) %>%
write_csv("sigresults.csv")
gene <- "ENSG00000146072.6"
gene_name <- "ENSG00000146072.6"
plotCounts(dds, gene=gene, intgroup="dex",returnData=TRUE) %>% ggplot(aes(dex, count)) + geom_boxplot(aes(fill=dex)) + scale_y_log10() + ggtitle(gene_name)
# Create the new column
res <- res %>% mutate(sig=padj<0.95)
# How many of each?
res %>%
group_by(sig) %>%
summarize(n=n())
log2fold_above_2 = res %>% filter(log2FoldChange > 2) %>% nrow()
log2fold_below_minus_2 = res %>% filter(log2FoldChange < -2) %>% nrow()
res %>% ggplot(aes(baseMean, log2FoldChange, col=sig)) + geom_point() + scale_x_log10() + ggtitle("MA plot")
There are 573 with log fold change above 2
There are 654 with log fold change below -2
res %>% ggplot(aes(log2FoldChange, -1*log10(pvalue), col=sig)) + geom_point() + ggtitle("Volcano plot")
sessionInfo()The sessionInfo() prints version information about R and any attached packages. It’s a good practice to always run this command at the end of your R session and record it for the sake of reproducibility in the future.
sessionInfo()
## R version 3.5.1 (2018-07-02)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 17134)
##
## Matrix products: default
##
## locale:
## [1] LC_COLLATE=English_Israel.1252 LC_CTYPE=English_Israel.1252
## [3] LC_MONETARY=English_Israel.1252 LC_NUMERIC=C
## [5] LC_TIME=English_Israel.1252
##
## attached base packages:
## [1] parallel stats4 stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] DESeq2_1.22.1 SummarizedExperiment_1.12.0
## [3] DelayedArray_0.8.0 BiocParallel_1.16.2
## [5] matrixStats_0.54.0 Biobase_2.42.0
## [7] GenomicRanges_1.34.0 GenomeInfoDb_1.18.1
## [9] IRanges_2.16.0 S4Vectors_0.20.1
## [11] BiocGenerics_0.28.0 bindrcpp_0.2.2
## [13] dplyr_0.7.8 readr_1.2.1
## [15] ggplot2_3.1.0 knitr_1.20
##
## loaded via a namespace (and not attached):
## [1] bit64_0.9-7 jsonlite_1.5 splines_3.5.1
## [4] Formula_1.2-3 assertthat_0.2.0 latticeExtra_0.6-28
## [7] blob_1.1.1 GenomeInfoDbData_1.2.0 yaml_2.2.0
## [10] RSQLite_2.1.1 pillar_1.3.0 backports_1.1.2
## [13] lattice_0.20-35 glue_1.3.0 digest_0.6.18
## [16] RColorBrewer_1.1-2 XVector_0.22.0 checkmate_1.8.5
## [19] colorspace_1.3-2 htmltools_0.3.6 Matrix_1.2-14
## [22] plyr_1.8.4 XML_3.98-1.16 pkgconfig_2.0.2
## [25] genefilter_1.64.0 zlibbioc_1.28.0 xtable_1.8-3
## [28] purrr_0.2.5 scales_1.0.0 annotate_1.60.0
## [31] tibble_1.4.2 htmlTable_1.12 withr_2.1.2
## [34] nnet_7.3-12 lazyeval_0.2.1 survival_2.42-3
## [37] magrittr_1.5 crayon_1.3.4 memoise_1.1.0
## [40] evaluate_0.12 foreign_0.8-70 tools_3.5.1
## [43] data.table_1.11.8 hms_0.4.2 stringr_1.3.1
## [46] locfit_1.5-9.1 munsell_0.5.0 cluster_2.0.7-1
## [49] AnnotationDbi_1.44.0 compiler_3.5.1 rlang_0.3.0.1
## [52] grid_3.5.1 RCurl_1.95-4.11 rstudioapi_0.8
## [55] htmlwidgets_1.3 bitops_1.0-6 base64enc_0.1-3
## [58] labeling_0.3 rmarkdown_1.11 gtable_0.2.0
## [61] codetools_0.2-15 DBI_1.0.0 R6_2.3.0
## [64] gridExtra_2.3 bit_1.1-14 bindr_0.1.1
## [67] Hmisc_4.1-1 stringi_1.2.4 Rcpp_1.0.0
## [70] geneplotter_1.60.0 rpart_4.1-13 acepack_1.4.1
## [73] tidyselect_0.2.5